Goto

Collaborating Authors

 West Bengal


a7c4163b33286261b24c72fd3d1707c9-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

These datasets enable large-scale study of abuse detection for these languages. Anonymized comments: To further address privacy concerns, we anonymize our dataset. We combine thehate and offensivecategories in these datasets for training a binary classification model. We showthepercentage (%)ofemoticons present inourdatasetMACDinTable12. Infuture work,we will investigate in detail about the impact of emoticons on abuse detection. However,duetothe limited scale and diversity of abuse detection datasets in Indic languages, development of these models for Indic languages has been severely impeded.





SAFEWORLD: Geo-DiverseSafetyAlignment

Neural Information Processing Systems

Despite significant progress inthisarea, anessential factor often remains overlooked:geo-diversity. Recognizing and incorporating geographical variations [41, 40, 4, 10, 31, 6] in safety principles is crucial in the global landscape of LLM safety. Cultural norms and legal frameworks vary widely, resulting in diverse definitions of safe and acceptable behavior. As shown in Figure 1, while giving a green hatasagift might bebenign inmanycultures, itisconsidered offensiveinChina.